Modeling non-stationary opponents

Authors

  • Pablo Hernandez-Leal
  • Enrique Munoz de Cote
  • Luis Enrique Sucar
Abstract

This paper studies repeated interactions between an agent and an unknown opponent that changes its strategy over time. We propose a framework for learning switching, non-stationary strategies. The approach uses decision trees to learn the opponent's most up-to-date strategy. The agent's strategy is then computed by transforming the tree into a Markov Decision Process (MDP), whose solution dictates the optimal way of playing against the learned strategy. The learned opponent model is continuously re-evaluated to check for strategy switches: by measuring tree similarity, our method detects when the opponent has changed its strategy and a new model has to be learned. We evaluated the proposed approach in the iterated prisoner's dilemma, where it outperformed common strategies against both stationary and non-stationary opponents.
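
The following is a minimal sketch of the pipeline the abstract describes, for the iterated prisoner's dilemma, assuming a one-step memory state and scikit-learn decision trees. The function names, the exact MDP construction, and the prediction-overlap similarity measure are illustrative choices for this sketch, not the paper's definitions.

```python
# Sketch: learn a decision-tree opponent model, turn it into an MDP and solve
# it, and compare successive trees to detect strategy switches. All design
# choices below (one-step memory, payoff values, similarity measure) are
# assumptions made for illustration.
import itertools
import numpy as np
from sklearn.tree import DecisionTreeClassifier

C, D = 0, 1                                  # cooperate / defect
PAYOFF = {(C, C): 3, (C, D): 0,              # row player's PD payoffs
          (D, C): 5, (D, D): 1}
STATES = list(itertools.product((C, D), (C, D)))   # (my_prev, opp_prev)

def fit_opponent_tree(history):
    """Learn a tree predicting the opponent's next move from the last joint
    action; `history` is a list of ((my_prev, opp_prev), opp_next) pairs."""
    X = [list(s) for s, _ in history]
    y = [opp_next for _, opp_next in history]
    return DecisionTreeClassifier().fit(X, y)

def best_response(tree, gamma=0.95, iters=200):
    """Induce a deterministic MDP from the tree (next state and reward follow
    from our action and the model's predicted reply) and solve it by value
    iteration, returning the optimal policy against the learned strategy."""
    reply = {s: int(tree.predict([list(s)])[0]) for s in STATES}
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        V = {s: max(PAYOFF[a, reply[s]] + gamma * V[a, reply[s]]
                    for a in (C, D)) for s in STATES}
    return {s: max((C, D), key=lambda a: PAYOFF[a, reply[s]]
                   + gamma * V[a, reply[s]]) for s in STATES}

def similarity(tree_a, tree_b):
    """Fraction of states on which two opponent models agree; a drop below
    some threshold would signal a strategy switch and trigger re-learning."""
    X = [list(s) for s in STATES]
    return float(np.mean(tree_a.predict(X) == tree_b.predict(X)))
```

For instance, fitting one tree on an older window of play and another on a recent window and observing `similarity` well below 1.0 would flag a switch, after which `best_response` is recomputed on the fresh model. The window sizes and the similarity threshold are tuning choices.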

Similar papers

Opponent Modeling against Non-stationary Strategies (Doctoral Consortium)

Most state-of-the-art learning algorithms do not fare well against agents (computer or human) that change their behaviour over time. This is because they usually do not model the other agents' behaviour and instead make assumptions that are too restrictive for real scenarios. Furthermore, considering that many applications demand different types of agents to work together, this should ...

Online Learning in Stochastic Games and Markov Decision Processes

In their standard formulations, stochastic games and Markov decision processes assume a rational opponent or a stationary environment. Online learning algorithms can adapt to arbitrary opponents and non-stationary environments, but do not incorporate the dynamic structure of stochastic games or Markov decision processes. We survey recent approaches that apply online learning to dynamic environm...

Learning Against Non-Stationary Opponents in Double Auctions

Energy markets are emerging around the world. In this context, the PowerTAC competition has gained attention for being a realistic and powerful simulation platform that can be used to perform robust research on retail energy markets. Agents in this complex environment typically use different strategies throughout their interaction, changing from one to another depending on diverse factors, for e...

Unifying Convergence and No-Regret in Multiagent Learning

We present a new multiagent learning algorithm, RVσ(t), that builds on an earlier version, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. But it makes two strong assumptions: (1) that it can distinguish between self-...

Using a Priori Information for Fast Learning Against Non-stationary Opponents

For an agent to be successful in interacting with many different and unknown types of opponents, it should excel at quickly learning a model of the opponent and at adapting online to non-stationary (changing) strategies. Recent works have tackled this problem by continuously learning models of the opponent while checking for switches in the opponent's strategy. However, these approaches fail to use a pr...

Publication date: 2013